{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 数学基础\n",
    "\n",
    "本节总结了本书中涉及的有关线性代数、微分和概率的基础知识。为避免赘述本书未涉及的数学背景知识，本节中的少数定义稍有简化。\n",
    "\n",
    "\n",
    "## 线性代数\n",
    "\n",
    "下面分别概括了向量、矩阵、运算、范数、特征向量和特征值的概念。\n",
    "\n",
    "### 向量\n",
    "\n",
    "本书中的向量指的是列向量。一个$n$维向量$\\boldsymbol{x}$的表达式可写成\n",
    "\n",
    "$$\n",
    "\\boldsymbol{x} = \n",
    "\\begin{bmatrix}\n",
    "    x_{1}  \\\\\n",
    "    x_{2}  \\\\\n",
    "    \\vdots  \\\\\n",
    "    x_{n} \n",
    "\\end{bmatrix},\n",
    "$$\n",
    "\n",
    "其中$x_1, \\ldots, x_n$是向量的元素。我们将各元素均为实数的$n$维向量$\\boldsymbol{x}$记作$\\boldsymbol{x} \\in \\mathbb{R}^{n}$或$\\boldsymbol{x} \\in \\mathbb{R}^{n \\times 1}$。\n",
    "\n",
    "\n",
    "### 矩阵\n",
    "\n",
    "一个$m$行$n$列矩阵的表达式可写成\n",
    "\n",
    "$$\n",
    "\\boldsymbol{X} = \n",
    "\\begin{bmatrix}\n",
    "    x_{11} & x_{12}  & \\dots  & x_{1n} \\\\\n",
    "    x_{21} & x_{22}  & \\dots  & x_{2n} \\\\\n",
    "    \\vdots & \\vdots  & \\ddots & \\vdots \\\\\n",
    "    x_{m1} & x_{m2}  & \\dots  & x_{mn}\n",
    "\\end{bmatrix},\n",
    "$$\n",
    "\n",
    "其中$x_{ij}$是矩阵$\\boldsymbol{X}$中第$i$行第$j$列的元素（$1 \\leq i \\leq m, 1 \\leq j \\leq n$）。我们将各元素均为实数的$m$行$n$列矩阵$\\boldsymbol{X}$记作$\\boldsymbol{X} \\in \\mathbb{R}^{m \\times n}$。不难发现，向量是特殊的矩阵。\n",
    "\n",
    "\n",
    "### 运算\n",
    "\n",
    "设$n$维向量$\\boldsymbol{a}$中的元素为$a_1, \\ldots, a_n$，$n$维向量$\\boldsymbol{b}$中的元素为$b_1, \\ldots, b_n$。向量$\\boldsymbol{a}$与$\\boldsymbol{b}$的点乘（内积）是一个标量：\n",
    "\n",
    "$$\\boldsymbol{a} \\cdot \\boldsymbol{b} = a_1 b_1 + \\ldots + a_n b_n.$$\n",
    "\n",
    "\n",
    "设两个$m$行$n$列矩阵\n",
    "\n",
    "$$\n",
    "\\boldsymbol{A} = \n",
    "\\begin{bmatrix}\n",
    "    a_{11} & a_{12} & \\dots  & a_{1n} \\\\\n",
    "    a_{21} & a_{22} & \\dots  & a_{2n} \\\\\n",
    "    \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
    "    a_{m1} & a_{m2} & \\dots  & a_{mn}\n",
    "\\end{bmatrix},\\quad\n",
    "\\boldsymbol{B} = \n",
    "\\begin{bmatrix}\n",
    "    b_{11} & b_{12} & \\dots  & b_{1n} \\\\\n",
    "    b_{21} & b_{22} & \\dots  & b_{2n} \\\\\n",
    "    \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
    "    b_{m1} & b_{m2} & \\dots  & b_{mn}\n",
    "\\end{bmatrix}.\n",
    "$$\n",
    "\n",
    "矩阵$\\boldsymbol{A}$的转置是一个$n$行$m$列矩阵，它的每一行其实是原矩阵的每一列：\n",
    "$$\n",
    "\\boldsymbol{A}^\\top = \n",
    "\\begin{bmatrix}\n",
    "    a_{11} & a_{21} & \\dots  & a_{m1} \\\\\n",
    "    a_{12} & a_{22} & \\dots  & a_{m2} \\\\\n",
    "    \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
    "    a_{1n} & a_{2n} & \\dots  & a_{mn}\n",
    "\\end{bmatrix}.\n",
    "$$\n",
    "\n",
    "\n",
    "两个相同形状的矩阵的加法是将两个矩阵按元素做加法：\n",
    "\n",
    "$$\n",
    "\\boldsymbol{A} + \\boldsymbol{B} = \n",
    "\\begin{bmatrix}\n",
    "    a_{11} + b_{11} & a_{12} + b_{12} & \\dots  & a_{1n} + b_{1n} \\\\\n",
    "    a_{21} + b_{21} & a_{22} + b_{22} & \\dots  & a_{2n} + b_{2n} \\\\\n",
    "    \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
    "    a_{m1} + b_{m1} & a_{m2} + b_{m2} & \\dots  & a_{mn} + b_{mn}\n",
    "\\end{bmatrix}.\n",
    "$$\n",
    "\n",
    "我们使用符号$\\odot$表示两个矩阵按元素做乘法的运算：\n",
    "\n",
    "$$\n",
    "\\boldsymbol{A} \\odot \\boldsymbol{B} = \n",
    "\\begin{bmatrix}\n",
    "    a_{11}  b_{11} & a_{12}  b_{12} & \\dots  & a_{1n}  b_{1n} \\\\\n",
    "    a_{21}  b_{21} & a_{22}  b_{22} & \\dots  & a_{2n}  b_{2n} \\\\\n",
    "    \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
    "    a_{m1}  b_{m1} & a_{m2}  b_{m2} & \\dots  & a_{mn}  b_{mn}\n",
    "\\end{bmatrix}.\n",
    "$$\n",
    "\n",
    "定义一个标量$k$。标量与矩阵的乘法也是按元素做乘法的运算：\n",
    "\n",
    "\n",
    "$$\n",
    "k\\boldsymbol{A} = \n",
    "\\begin{bmatrix}\n",
    "    ka_{11} & ka_{12} & \\dots  & ka_{1n} \\\\\n",
    "    ka_{21} & ka_{22} & \\dots  & ka_{2n} \\\\\n",
    "    \\vdots & \\vdots   & \\ddots & \\vdots \\\\\n",
    "    ka_{m1} & ka_{m2} & \\dots  & ka_{mn}\n",
    "\\end{bmatrix}.\n",
    "$$\n",
    "\n",
    "其他诸如标量与矩阵按元素相加、相除等运算与上式中的相乘运算类似。矩阵按元素开根号、取对数等运算也就是对矩阵每个元素开根号、取对数等，并得到和原矩阵形状相同的矩阵。\n",
    "\n",
    "矩阵乘法和按元素的乘法不同。设$\\boldsymbol{A}$为$m$行$p$列的矩阵，$\\boldsymbol{B}$为$p$行$n$列的矩阵。两个矩阵相乘的结果\n",
    "\n",
    "$$\n",
    "\\boldsymbol{A} \\boldsymbol{B} = \n",
    "\\begin{bmatrix}\n",
    "    a_{11} & a_{12} & \\dots  & a_{1p} \\\\\n",
    "    a_{21} & a_{22} & \\dots  & a_{2p} \\\\\n",
    "    \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
    "    a_{i1} & a_{i2} & \\dots  & a_{ip} \\\\\n",
    "    \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
    "    a_{m1} & a_{m2} & \\dots  & a_{mp}\n",
    "\\end{bmatrix}\n",
    "\\begin{bmatrix}\n",
    "    b_{11} & b_{12} & \\dots  & b_{1j} & \\dots & b_{1n} \\\\\n",
    "    b_{21} & b_{22} & \\dots  & b_{2j} & \\dots  & b_{2n} \\\\\n",
    "    \\vdots & \\vdots & \\ddots & \\vdots & \\ddots & \\vdots \\\\\n",
    "    b_{p1} & b_{p2} & \\dots  & b_{pj} & \\dots  & b_{pn}\n",
    "\\end{bmatrix}\n",
    "$$\n",
    "\n",
    "是一个$m$行$n$列的矩阵，其中第$i$行第$j$列（$1 \\leq i \\leq m, 1 \\leq j \\leq n$）的元素为\n",
    "\n",
    "$$a_{i1}b_{1j}  + a_{i2}b_{2j} + \\ldots + a_{ip}b_{pj} = \\sum_{k=1}^p a_{ik}b_{kj}. $$\n",
    "\n",
    "\n",
    "### 范数\n",
    "\n",
    "设$n$维向量$\\boldsymbol{x}$中的元素为$x_1, \\ldots, x_n$。向量$\\boldsymbol{x}$的$L_p$范数为\n",
    "\n",
    "$$\\|\\boldsymbol{x}\\|_p = \\left(\\sum_{i=1}^n \\left|x_i \\right|^p \\right)^{1/p}.$$\n",
    "\n",
    "例如，$\\boldsymbol{x}$的$L_1$范数是该向量元素绝对值之和：\n",
    "\n",
    "$$\\|\\boldsymbol{x}\\|_1 = \\sum_{i=1}^n \\left|x_i \\right|.$$\n",
    "\n",
    "而$\\boldsymbol{x}$的$L_2$范数是该向量元素平方和的平方根：\n",
    "\n",
    "$$\\|\\boldsymbol{x}\\|_2 = \\sqrt{\\sum_{i=1}^n x_i^2}.$$\n",
    "\n",
    "我们通常用$\\|\\boldsymbol{x}\\|$指代$\\|\\boldsymbol{x}\\|_2$。\n",
    "\n",
    "设$\\boldsymbol{X}$是一个$m$行$n$列矩阵。矩阵$\\boldsymbol{X}$的Frobenius范数为该矩阵元素平方和的平方根：\n",
    "\n",
    "$$\\|\\boldsymbol{X}\\|_F = \\sqrt{\\sum_{i=1}^m \\sum_{j=1}^n x_{ij}^2},$$\n",
    "\n",
    "其中$x_{ij}$为矩阵$\\boldsymbol{X}$在第$i$行第$j$列的元素。\n",
    "\n",
    "\n",
    "### 特征向量和特征值\n",
    "\n",
    "\n",
    "对于一个$n$行$n$列的矩阵$\\boldsymbol{A}$，假设有标量$\\lambda$和非零的$n$维向量$\\boldsymbol{v}$使\n",
    "\n",
    "$$\\boldsymbol{A} \\boldsymbol{v} = \\lambda \\boldsymbol{v},$$\n",
    "\n",
    "那么$\\boldsymbol{v}$是矩阵$\\boldsymbol{A}$的一个特征向量，标量$\\lambda$是$\\boldsymbol{v}$对应的特征值。\n",
    "\n",
    "\n",
    "\n",
    "## 微分\n",
    "\n",
    "我们在这里简要介绍微分的一些基本概念和演算。\n",
    "\n",
    "\n",
    "### 导数和微分\n",
    "\n",
    "假设函数$f: \\mathbb{R} \\rightarrow \\mathbb{R}$的输入和输出都是标量。函数$f$的导数\n",
    "\n",
    "$$f'(x) = \\lim_{h \\rightarrow 0} \\frac{f(x+h) - f(x)}{h},$$\n",
    "\n",
    "且假定该极限存在。给定$y = f(x)$，其中$x$和$y$分别是函数$f$的自变量和因变量。以下有关导数和微分的表达式等价：\n",
    "\n",
    "$$f'(x) = y' = \\frac{\\text{d}y}{\\text{d}x} = \\frac{\\text{d}f}{\\text{d}x} = \\frac{\\text{d}}{\\text{d}x} f(x) = \\text{D}f(x) = \\text{D}_x f(x),$$\n",
    "\n",
    "其中符号$\\text{D}$和$\\text{d}/\\text{d}x$也叫微分运算符。常见的微分演算有$\\text{D}C = 0$（$C$为常数）、$\\text{D}x^n = nx^{n-1}$（$n$为常数）、$\\text{D}e^x = e^x$、$\\text{D}\\ln(x) = 1/x$等。\n",
    "\n",
    "如果函数$f$和$g$都可导，设$C$为常数，那么\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\frac{\\text{d}}{\\text{d}x} [Cf(x)] &= C \\frac{\\text{d}}{\\text{d}x} f(x),\\\\\n",
    "\\frac{\\text{d}}{\\text{d}x} [f(x) + g(x)] &= \\frac{\\text{d}}{\\text{d}x} f(x) + \\frac{\\text{d}}{\\text{d}x} g(x),\\\\ \n",
    "\\frac{\\text{d}}{\\text{d}x} [f(x)g(x)] &= f(x) \\frac{\\text{d}}{\\text{d}x} [g(x)] + g(x) \\frac{\\text{d}}{\\text{d}x} [f(x)],\\\\\n",
    "\\frac{\\text{d}}{\\text{d}x} \\left[\\frac{f(x)}{g(x)}\\right] &= \\frac{g(x) \\frac{\\text{d}}{\\text{d}x} [f(x)] - f(x) \\frac{\\text{d}}{\\text{d}x} [g(x)]}{[g(x)]^2}.\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "\n",
    "如果$y=f(u)$和$u=g(x)$都是可导函数，依据链式法则，\n",
    "\n",
    "$$\\frac{\\text{d}y}{\\text{d}x} = \\frac{\\text{d}y}{\\text{d}u} \\frac{\\text{d}u}{\\text{d}x}.$$\n",
    "\n",
    "\n",
    "### 泰勒展开\n",
    "\n",
    "函数$f$的泰勒展开式是\n",
    "\n",
    "$$f(x) = \\sum_{n=0}^\\infty \\frac{f^{(n)}(a)}{n!} (x-a)^n,$$\n",
    "\n",
    "其中$f^{(n)}$为函数$f$的$n$阶导数（求$n$次导数），$n!$为$n$的阶乘。假设$\\epsilon$是一个足够小的数，如果将上式中$x$和$a$分别替换成$x+\\epsilon$和$x$，可以得到\n",
    "\n",
    "$$f(x + \\epsilon) \\approx f(x) + f'(x) \\epsilon + \\mathcal{O}(\\epsilon^2).$$\n",
    "\n",
    "由于$\\epsilon$足够小，上式也可以简化成\n",
    "\n",
    "$$f(x + \\epsilon) \\approx f(x) + f'(x) \\epsilon.$$\n",
    "\n",
    "\n",
    "\n",
    "### 偏导数\n",
    "\n",
    "设$u$为一个有$n$个自变量的函数，$u = f(x_1, x_2, \\ldots, x_n)$，它有关第$i$个变量$x_i$的偏导数为\n",
    "\n",
    "$$ \\frac{\\partial u}{\\partial x_i} = \\lim_{h \\rightarrow 0} \\frac{f(x_1, \\ldots, x_{i-1}, x_i+h, x_{i+1}, \\ldots, x_n) - f(x_1, \\ldots, x_i, \\ldots, x_n)}{h}.$$\n",
    "\n",
    "\n",
    "以下有关偏导数的表达式等价：\n",
    "\n",
    "$$\\frac{\\partial u}{\\partial x_i} = \\frac{\\partial f}{\\partial x_i} = f_{x_i} = f_i = \\text{D}_i f = \\text{D}_{x_i} f.$$\n",
    "\n",
    "为了计算$\\partial u/\\partial x_i$，只需将$x_1, \\ldots, x_{i-1}, x_{i+1}, \\ldots, x_n$视为常数并求$u$有关$x_i$的导数。\n",
    "\n",
    "\n",
    "\n",
    "### 梯度\n",
    "\n",
    "\n",
    "假设函数$f: \\mathbb{R}^n \\rightarrow \\mathbb{R}$的输入是一个$n$维向量$\\boldsymbol{x} = [x_1, x_2, \\ldots, x_n]^\\top$，输出是标量。函数$f(\\boldsymbol{x})$有关$\\boldsymbol{x}$的梯度是一个由$n$个偏导数组成的向量：\n",
    "\n",
    "$$\\nabla_{\\boldsymbol{x}} f(\\boldsymbol{x}) = \\bigg[\\frac{\\partial f(\\boldsymbol{x})}{\\partial x_1}, \\frac{\\partial f(\\boldsymbol{x})}{\\partial x_2}, \\ldots, \\frac{\\partial f(\\boldsymbol{x})}{\\partial x_n}\\bigg]^\\top.$$\n",
    "\n",
    "\n",
    "为表示简洁，我们有时用$\\nabla f(\\boldsymbol{x})$代替$\\nabla_{\\boldsymbol{x}} f(\\boldsymbol{x})$。\n",
    "\n",
    "假设$\\boldsymbol{x}$是一个向量，常见的梯度演算包括\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\nabla_{\\boldsymbol{x}} \\boldsymbol{A}^\\top \\boldsymbol{x} &= \\boldsymbol{A}, \\\\\n",
    "\\nabla_{\\boldsymbol{x}} \\boldsymbol{x}^\\top \\boldsymbol{A}  &= \\boldsymbol{A}, \\\\\n",
    "\\nabla_{\\boldsymbol{x}} \\boldsymbol{x}^\\top \\boldsymbol{A} \\boldsymbol{x}  &= (\\boldsymbol{A} + \\boldsymbol{A}^\\top)\\boldsymbol{x},\\\\\n",
    "\\nabla_{\\boldsymbol{x}} \\|\\boldsymbol{x} \\|^2 &= \\nabla_{\\boldsymbol{x}} \\boldsymbol{x}^\\top \\boldsymbol{x} = 2\\boldsymbol{x}.\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "类似地，假设$\\boldsymbol{X}$是一个矩阵，那么\n",
    "$$\\nabla_{\\boldsymbol{X}} \\|\\boldsymbol{X} \\|_F^2 = 2\\boldsymbol{X}.$$\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "### 海森矩阵\n",
    "\n",
    "假设函数$f: \\mathbb{R}^n \\rightarrow \\mathbb{R}$的输入是一个$n$维向量$\\boldsymbol{x} = [x_1, x_2, \\ldots, x_n]^\\top$，输出是标量。假定函数$f$所有的二阶偏导数都存在，$f$的海森矩阵$\\boldsymbol{H}$是一个$n$行$n$列的矩阵：\n",
    "\n",
    "$$\n",
    "\\boldsymbol{H} = \n",
    "\\begin{bmatrix}\n",
    "    \\frac{\\partial^2 f}{\\partial x_1^2} & \\frac{\\partial^2 f}{\\partial x_1 \\partial x_2} & \\dots  & \\frac{\\partial^2 f}{\\partial x_1 \\partial x_n} \\\\\n",
    "    \\frac{\\partial^2 f}{\\partial x_2 \\partial x_1} & \\frac{\\partial^2 f}{\\partial x_2^2} & \\dots  & \\frac{\\partial^2 f}{\\partial x_2 \\partial x_n} \\\\\n",
    "    \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
    "    \\frac{\\partial^2 f}{\\partial x_n \\partial x_1} & \\frac{\\partial^2 f}{\\partial x_n \\partial x_2} & \\dots  & \\frac{\\partial^2 f}{\\partial x_n^2}\n",
    "\\end{bmatrix},\n",
    "$$\n",
    "\n",
    "其中二阶偏导数\n",
    "\n",
    "$$\\frac{\\partial^2 f}{\\partial x_i \\partial x_j} = \\frac{\\partial }{\\partial x_j} \\left(\\frac{\\partial f}{ \\partial x_i}\\right).$$\n",
    "\n",
    "\n",
    "\n",
    "## 概率\n",
    "\n",
    "最后，我们简要介绍条件概率、期望和均匀分布。\n",
    "\n",
    "### 条件概率\n",
    "\n",
    "假设事件$A$和事件$B$的概率分别为$P(A)$和$P(B)$，两个事件同时发生的概率记作$P(A \\cap B)$或$P(A, B)$。给定事件$B$，事件$A$的条件概率\n",
    "\n",
    "$$P(A \\mid B) = \\frac{P(A \\cap B)}{P(B)}.$$\n",
    "\n",
    "也就是说，\n",
    "\n",
    "$$P(A \\cap B) = P(B) P(A \\mid B) = P(A) P(B \\mid A).$$\n",
    "\n",
    "当满足\n",
    "\n",
    "$$P(A \\cap B) = P(A) P(B)$$\n",
    "\n",
    "时，事件$A$和事件$B$相互独立。\n",
    "\n",
    "\n",
    "### 期望\n",
    "\n",
    "离散的随机变量$X$的期望（或平均值）为\n",
    "\n",
    "$$E(X) = \\sum_{x} x P(X = x).$$\n",
    "\n",
    "\n",
    "\n",
    "### 均匀分布\n",
    "\n",
    "假设随机变量$X$服从$[a, b]$上的均匀分布，即$X \\sim U(a, b)$。随机变量$X$取$a$和$b$之间任意一个数的概率相等。\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "## 小结\n",
    "\n",
    "* 本节总结了本书中涉及的有关线性代数、微分和概率的基础知识。\n",
    "\n",
    "\n",
    "## 练习\n",
    "\n",
    "* 求函数$f(\\boldsymbol{x}) = 3x_1^2 + 5e^{x_2}$的梯度。\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "## 扫码直达[讨论区](https://discuss.gluon.ai/t/topic/6966)\n",
    "\n",
    "![](../img/qr_math.svg)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:pytorch]",
   "language": "python",
   "name": "conda-env-pytorch-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
